Component-based Attention for Large-scale Trademark Retrieval
The demand for large-scale trademark retrieval (TR) systems has significantly
increased to combat the rise in international trademark infringement.
Unfortunately, the ranking accuracy of current approaches using either
hand-crafted or pre-trained deep convolutional neural network (DCNN) features is
inadequate for large-scale deployments. We show in this paper that the ranking
accuracy of TR systems can be significantly improved by incorporating hard and
soft attention mechanisms, which direct attention to critical information such
as figurative elements and reduce attention given to distracting and
uninformative elements such as text and background. Our proposed approach
achieves state-of-the-art results on a challenging large-scale trademark
dataset. Comment: Fix typos related to authors' information.
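The soft-attention idea described above can be illustrated with a minimal sketch: score each spatial location of a DCNN feature map, softmax over positions, and pool a single descriptor. The shapes and the linear scoring below are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def soft_attention_pool(feature_map, w):
    # Spatial soft attention: score each location, softmax over
    # positions, then take the weighted sum as the image descriptor.
    scores = feature_map @ w                    # (H, W) relevance per location
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # spatial softmax, sums to 1
    return (feature_map * weights[..., None]).sum(axis=(0, 1))

fmap = np.random.rand(7, 7, 512)                # e.g. final conv features
w = np.random.rand(512)                         # attention projection vector
desc = soft_attention_pool(fmap, w)             # one 512-d image descriptor
```

Locations with higher scores (e.g. figurative elements) dominate the pooled descriptor, while low-scoring regions (text, background) are down-weighted.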
MTRNet: A Generic Scene Text Eraser
Text removal algorithms have been proposed for uni-lingual scripts with
regular shapes and layouts. However, to the best of our knowledge, a generic
text removal method which is able to remove all or user-specified text regions
regardless of font, script, language or shape is not available. Developing such
a generic text eraser for real scenes is a challenging task, since it inherits
all the challenges of multi-lingual and curved text detection and inpainting.
To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet
is a conditional adversarial generative network (cGAN) with an auxiliary mask.
The introduced auxiliary mask not only makes the cGAN a generic text eraser,
but also enables stable training and early convergence on a challenging
large-scale synthetic dataset, initially proposed for text detection in real
scenes. Moreover, MTRNet achieves state-of-the-art results on several
real-world datasets, including ICDAR 2013, ICDAR 2017 MLT, and CTW1500, without
being explicitly trained on them, outperforming previous state-of-the-art
methods trained directly on these datasets. Comment: Presented at the ICDAR 2019 conference.
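Conditioning a cGAN generator on an auxiliary mask is typically done by stacking the mask as an extra input channel. A minimal sketch of that input construction (MTRNet's generator itself is a trained network, not shown here):

```python
import numpy as np

def generator_input(image, mask):
    # Concatenate the auxiliary text mask as a fourth channel of the
    # generator input, the usual way a cGAN is conditioned on a mask.
    assert image.shape[:2] == mask.shape
    return np.concatenate([image, mask[..., None]], axis=-1)

img = np.zeros((256, 256, 3))      # RGB scene image
mask = np.ones((256, 256))         # 1 marks text pixels to erase
x = generator_input(img, mask)     # (256, 256, 4) conditioned input
```

Setting the mask to all ones asks for all text to be removed; a user-drawn mask restricts removal to specified regions, which is what makes the eraser generic.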
Learning Test-time Data Augmentation for Image Retrieval with Reinforcement Learning
Off-the-shelf convolutional neural network features achieve outstanding
results in many image retrieval tasks. However, their invariance is pre-defined
by the network architecture and training data. Existing image retrieval
approaches require fine-tuning or modification of the pre-trained networks to
adapt to the variations in the target data. In contrast, our method enhances
the invariance of off-the-shelf features by aggregating features extracted from
images augmented with learned test-time augmentations. The optimal ensemble of
test-time augmentations is learned automatically through reinforcement
learning. Our training is time- and resource-efficient and learns a diverse set
of test-time augmentations. Experimental results on trademark retrieval (METU
trademark dataset) and landmark retrieval (Oxford5k and Paris6k scene datasets)
tasks show that the learned ensemble of transformations is effective and
transferable. We also achieve state-of-the-art MAP@100 results on the METU
trademark dataset.
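The aggregation step can be sketched as follows: apply each transform in the learned ensemble, extract features from every augmented copy, and pool them into one descriptor. Mean pooling and the toy transforms/extractor below are stand-ins for illustration; the paper learns which transforms to ensemble via reinforcement learning.

```python
import numpy as np

def aggregate_tta_features(image, transforms, extract):
    # Extract features from each augmented copy of the image and
    # average them, then L2-normalise for cosine-similarity retrieval.
    feats = [extract(t(image)) for t in transforms]
    f = np.mean(feats, axis=0)
    return f / np.linalg.norm(f)

# toy stand-ins for a learned augmentation set and a CNN extractor
transforms = [lambda x: x, lambda x: np.fliplr(x), lambda x: np.rot90(x)]
extract = lambda x: x.mean(axis=(0, 1))       # per-channel mean as "features"
img = np.random.rand(32, 32, 3)
desc = aggregate_tta_features(img, transforms, extract)
```

Because only the input images are transformed, the pre-trained network stays frozen, which is why no fine-tuning is needed.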
Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models
Heatmaps are widely used to interpret deep neural networks, particularly for
computer vision tasks, and the heatmap-based explainable AI (XAI) techniques
are a well-researched topic. However, most studies concentrate on enhancing the
quality of the generated heatmap or discovering alternate heatmap generation
techniques, and little effort has been devoted to making heatmap-based XAI
automatic, interactive, scalable, and accessible. To address this gap, we
propose a framework that includes two modules: (1) context modelling and (2)
reasoning. We propose a template-based image captioning approach for context
modelling to create text-based contextual information from the heatmap and
input data. The reasoning module leverages a large language model to provide
explanations in combination with specialised knowledge. Our qualitative
experiments demonstrate the effectiveness of our framework and heatmap
captioning approach. The code for the proposed template-based heatmap
captioning approach will be publicly available.
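Template-based captioning of a heatmap can be reduced to a very small sketch: rank the regions by attribution mass and fill a fixed sentence template. The region names and template below are invented for illustration; the framework's real templates and region extraction are not specified in this abstract.

```python
def heatmap_caption(region_scores, template):
    # Pick the region carrying the most attribution and verbalise it.
    top = max(region_scores, key=region_scores.get)
    return template.format(region=top, score=region_scores[top])

scores = {"beak": 0.61, "wing": 0.27, "background": 0.12}
caption = heatmap_caption(
    scores,
    "The model's prediction relies mostly on the {region} "
    "(attribution {score:.0%}).")
```

The resulting text is what the reasoning module would pass, together with specialised knowledge, to the large language model.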
A facile solid-state heating method for preparation of poly(3,4-ethylenedioxythiophene)/ZnO nanocomposite and photocatalytic activity
Poly(3,4-ethylenedioxythiophene)/zinc oxide (PEDOT/ZnO) nanocomposites were prepared by a simple solid-state heating method, in which the content of ZnO was varied from 10 to 20 wt%. The structure and morphology of the composites were characterized by Fourier transform infrared (FTIR) spectroscopy, ultraviolet-visible (UV-vis) absorption spectroscopy, X-ray diffraction (XRD), and transmission electron microscopy (TEM). The photocatalytic activities of the composites were investigated by the degradation of methylene blue (MB) dye in aqueous medium under UV light and natural sunlight irradiation. The FTIR, UV-vis, and XRD results showed that the composites were successfully synthesized, and there was a strong interaction between PEDOT and nano-ZnO. The TEM results suggested that the composites were a mixture of shale-like PEDOT and less aggregated nano-ZnO. The photocatalytic activity results indicated that the incorporation of ZnO nanoparticles can enhance the photocatalytic efficiency of the composites under both UV light and natural sunlight irradiation, and the highest photocatalytic efficiency under UV light (98.7%) and natural sunlight (96.6%) after 5 h occurred in the PEDOT/15 wt% ZnO nanocomposite.
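Efficiency figures like those quoted above are conventionally computed from the dye concentration before and after irradiation, eta = (C0 - C) / C0 x 100. The concentrations below are invented for illustration only; the abstract does not report them.

```python
def degradation_efficiency(c0, c):
    # Standard photocatalytic degradation efficiency (%) from the
    # initial and final dye concentrations.
    return (c0 - c) / c0 * 100.0

# hypothetical MB concentrations (mg/L) before and after 5 h under UV
eta = degradation_efficiency(10.0, 0.13)
```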
MTRNet++: One-stage Mask-based Scene Text Eraser
A precise, controllable, interpretable and easily trainable text removal
approach is necessary for both user-specific and large-scale text removal
applications. To achieve this, we propose a one-stage mask-based text
inpainting network, MTRNet++. It has a novel architecture that includes
mask-refine, coarse-inpainting and fine-inpainting branches, and attention
blocks. With this architecture, MTRNet++ can remove text either with or without
an external mask. It achieves state-of-the-art results on both the Oxford and
SCUT datasets without using external ground-truth masks. The results of
ablation studies demonstrate that the proposed multi-branch architecture with
attention blocks is effective and essential. It also demonstrates
controllability and interpretability. Comment: This paper is under CVIU review (after major revision).
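The three-branch flow described above can be sketched schematically. The branch bodies below are toy stand-ins (the real branches are trained subnetworks); only the wiring, including the optional external mask, follows the abstract.

```python
import numpy as np

def mask_refine(image, rough_mask):
    # stand-in for the learned mask-refine branch
    return np.clip(rough_mask, 0.0, 1.0)

def coarse_inpaint(image, mask):
    # toy coarse fill: replace masked pixels with the image mean colour
    fill = image.mean(axis=(0, 1), keepdims=True)
    return image * (1 - mask[..., None]) + fill * mask[..., None]

def fine_inpaint(coarse):
    # stand-in for the learned fine-inpainting branch
    return coarse

def mtrnetpp_forward(image, mask=None):
    # With no external mask, start from an all-ones guess; this is
    # what lets the model run either with or without user input.
    if mask is None:
        mask = np.ones(image.shape[:2])
    m = mask_refine(image, mask)
    return fine_inpaint(coarse_inpaint(image, m))

img = np.random.rand(8, 8, 3)
out = mtrnetpp_forward(img)        # no external mask supplied
```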
Missing ingredients in optimising large-scale image retrieval with deep features
This thesis applies advanced image processing and deep machine learning techniques to solve the challenges of large-scale image retrieval. Solutions are provided to overcome key obstacles in real-world large-scale image retrieval applications by introducing unique methods for making deep learning systems more reliable and efficient. The outcome of the research is useful for several image retrieval applications including patent search, and trademark and logo infringement analysis.
METU dataset: A big dataset for benchmarking trademark retrieval
Trademark retrieval (TR) is the problem of retrieving similar trademarks (logos) for a query, and the main aim is to detect copyright infringements in trademarks. Since there are millions of companies worldwide, automatically retrieving similar trademarks has become an important problem, and currently, checking trademark infringements is mostly performed manually by humans. However, although there have been many attempts at automated TR, as also acknowledged in the community, the problem is largely unsolved. One of the main reasons for this is the unavailability of a publicly available comprehensive dataset that includes the various challenges of the TR problem. In this article, we propose and introduce a large dataset composed of more than 930,000 trademarks, and evaluate the existing approaches in the literature on this dataset. We show that the existing methods are far from being useful on such a challenging dataset, and we hope that the dataset can facilitate the development of better methods to make progress in the performance of trademark retrieval systems.
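Benchmarking retrieval methods on a dataset like this usually comes down to a rank-based score per query. A common choice for TR evaluation is the normalised average rank, sketched below; whether it is this paper's exact metric is an assumption here.

```python
import numpy as np

def normalized_average_rank(ranks, n_relevant, n_total):
    # Normalised average rank of the relevant items for one query:
    # 0 is a perfect ranking, about 0.5 is a random one.
    ranks = np.asarray(ranks, dtype=float)   # 1-based ranks of relevant items
    return (ranks.sum() / n_relevant - (n_relevant + 1) / 2) / n_total

# a query whose 3 relevant trademarks rank 1, 2, 3 out of 1000 scores 0
nar = normalized_average_rank([1, 2, 3], 3, 1000)
```

Averaging this score over all benchmark queries gives a single number with which different TR approaches can be compared.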
Noisy Uyghur Text Normalization
Uyghur is the second largest and most actively used social media language in China. However, a non-negligible part of Uyghur text appearing in social media is unsystematically written with the Latin alphabet, and it continues to increase in size. Uyghur text in this format is incomprehensible and ambiguous even to native Uyghur speakers. In addition, Uyghur texts in this form cannot benefit from advances in NLP tasks for the Uyghur language. Normalizing noisy Uyghur text written in unsystematic Latin script, and preventing its spread, is essential to protecting the Uyghur language and improving the accuracy of Uyghur NLP tasks. To this purpose, in this work we propose and compare the noisy channel model and the neural encoder-decoder model as normalizing methods.
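The noisy channel model mentioned above decodes a noisy word n by choosing the clean candidate w that maximises P(w) * P(n | w). A minimal sketch with invented probabilities; a real system estimates the language model from a clean corpus and the channel model from aligned noisy/clean pairs.

```python
def normalize_noisy(noisy, candidates, lm_prob, channel_prob):
    # argmax over candidates of language-model prob times channel prob
    return max(candidates,
               key=lambda w: lm_prob[w] * channel_prob.get((noisy, w), 0.0))

# hypothetical toy distributions for a single noisy token
lm_prob = {"kitab": 0.7, "kitap": 0.3}
channel_prob = {("kitab", "kitab"): 0.6, ("kitab", "kitap"): 0.4}
best = normalize_noisy("kitab", ["kitab", "kitap"], lm_prob, channel_prob)
```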
A Large-scale Dataset and Benchmark for Similar Trademark Retrieval
Trademark retrieval (TR) has become an important yet challenging problem due
to an ever-increasing trend in trademark applications and infringement
incidents. There have been many promising attempts at the TR problem, which,
however, proved impractical since they were evaluated on limited and mostly
trivial datasets. In this paper, we provide a large-scale dataset with
benchmark queries with which different TR approaches can be evaluated
systematically. Moreover, we provide a baseline on this benchmark using the
widely used methods applied to TR in the literature. Furthermore, we identify
and correct two important issues in TR approaches that were not addressed
before: reversal of contrast and the presence of irrelevant text in trademarks,
both of which severely affect TR methods. Lastly, we apply deep learning,
namely several popular convolutional neural network models, to the TR problem.
To the best of the authors' knowledge, this is the first attempt to do so.
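One simple way to handle the contrast-reversal issue is to detect light-on-dark trademarks and invert them so every image shares a dark-on-light convention before feature extraction. This is a sketch under that assumption; the paper's actual correction may differ.

```python
import numpy as np

def normalize_contrast(gray):
    # gray: image with intensities in [0, 1]; invert if the image is
    # mostly dark, i.e. likely a light logo on a dark background.
    if gray.mean() < 0.5:
        return 1.0 - gray
    return gray

dark_bg = np.full((4, 4), 0.1)     # toy light-on-dark trademark
fixed = normalize_contrast(dark_bg)
```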